Szymon Grabowski

Authors

  • SZYMON GRABOWSKI
  • SEBASTIAN DEOROWICZ
Abstract

Web logs store client activity on a particular server, usually in the form of one-line “hits” with information such as the client’s IP address, date and time, requested file or query, and download size in bytes. Web logs of popular sites may grow by hundreds of megabytes a day, or even more. It makes sense to archive old logs so that they can be analyzed further, e.g. to detect attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. The test results show that the proposed transform improves the compression efficiency of general-purpose compressors on average by 65% in the case of gzip and 52% in the case of bzip2.
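The abstract does not describe the transform itself, so the following is only a minimal sketch of the general idea of lossless log preprocessing, not the authors' method. It assumes log lines in the Apache Common Log Format, splits them into per-field columns so that similar values sit next to each other, and rebuilds the original lines exactly. The names `CLF`, `split_fields`, `join_fields` and `compressed_size`, as well as the sample lines, are hypothetical and chosen for illustration.

```python
import gzip
import re

# Assumed input: Apache Common Log Format,
#   host ident user [time] "request" status bytes
# with single spaces between fields and no escaped quotes in the request.
CLF = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def split_fields(lines):
    """Turn a list of log lines into seven lists, one per field."""
    columns = [[] for _ in range(7)]
    for line in lines:
        m = CLF.fullmatch(line)
        if m is None:
            # Refuse lines the sketch cannot rebuild exactly.
            raise ValueError(f"not a Common Log Format line: {line!r}")
        for col, value in zip(columns, m.groups()):
            col.append(value)
    return columns


def join_fields(columns):
    """Inverse of split_fields: rebuild the original lines exactly."""
    return [
        f'{host} {ident} {user} [{time}] "{request}" {status} {size}'
        for host, ident, user, time, request, status, size in zip(*columns)
    ]


def compressed_size(columns):
    """Total size in bytes after gzip-compressing each column separately."""
    return sum(
        len(gzip.compress("\n".join(col).encode("utf-8"))) for col in columns
    )


# Hypothetical sample data, not taken from the paper.
sample = [
    '192.168.0.1 - - [10/Oct/2007:13:55:36 +0200] "GET /index.html HTTP/1.0" 200 2326',
    '192.168.0.2 - - [10/Oct/2007:13:55:37 +0200] "GET /logo.png HTTP/1.0" 200 512',
]
columns = split_fields(sample)
assert join_fields(columns) == sample  # the transform is lossless
```

Whether such a column-wise layout actually helps gzip or bzip2 depends on the data; the gains quoted in the abstract refer to the authors' own preprocessor, not to this sketch.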

Similar resources

A general compression algorithm that supports fast searching

The task of compressed pattern matching [2] is to report all the occurrences of a given pattern P in a text T available in compressed form. Certain compression algorithms allow for searching without prior decoding, which may be practical, especially if the search is faster than in the non-compressed representation. Most of the known schemes, however, either assume a text formed into words, or are...

Multiple Pattern Matching Revisited

We consider the classical exact multiple string matching problem. Our solution is based on q-grams combined with pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of the various ways to achieve the best combination. Our method is closely related to previous work by Salmela et al. (2006). The experimental results show that our method p...

Preprocessing for Real-Time Handwritten Character Recognition

We present a real-time on-line handwritten character recognition system, based on an ensemble of neural networks. In this work we focus on the developed preprocessing algorithms, which help achieve a high accuracy rate without a visible delay in the recognition process.

Simple Techniques for Plagiarism Detection in Student Programming Projects

In this paper we deal with the problem of stolen program code. What is specific to plagiarism attempts in a programmer's work is that in most programming languages it is very easy to change the “look” of a piece of code without changing its semantics at all. Basically, plagiarism detection algorithms look at either the code structure or just specific phrases. We experiment with the latt...


Publication date: 2007